[57] ABSTRACT 


A speech-based system for assessing the psychological, 
physiological, or other characteristics of a test subject is 
described. The system includes a knowledge base that stores 
one or more speech models, where each speech model 
corresponds to a characteristic of a group of reference 
subjects. Signal processing circuitry, which may be imple- 
mented in hardware, software and/or firmware, compares the 
test speech parameters of a test subject with the speech 
models. In one embodiment, each speech model is repre- 
sented by a statistical time-ordered series of frequency 
representations of the speech of the reference subjects. The 
speech model is independent of a priori knowledge of style 
parameters associated with the voice or speech. The system 
includes speech parameterization circuitry for generating the 
test parameters in response to the test subject’s speech. This 
circuitry includes speech acquisition circuitry, which may be 
located remotely from the knowledge base. The system 
further includes output circuitry for outputting at least one 
indicator of a characteristic in response to the comparison 
performed by the signal processing circuitry. The character- 
istic may be time-varying, in which case the output circuitry 
outputs the characteristic in a time-varying manner. The 
output circuitry also may output a ranking of each output 
characteristic. In one embodiment, one or more character- 
istics may indicate the degree of sincerity of the test subject, 
where the degree of sincerity may vary with time. The 
system may also be employed to determine the effectiveness 
of treatment for a psychological or physiological disorder by 
comparing psychological or physiological characteristics, 
respectively, before and after treatment. 
SPEECH SIGNAL PROCESSING FOR 
DETERMINING PSYCHOLOGICAL OR 
PHYSIOLOGICAL CHARACTERISTICS 

USING A KNOWLEDGE BASE 


BACKGROUND 


1. Field of the Invention 


The present invention relates to the field of speech 
analysis, and in particular to the analysis of an individual’s 
speech to determine psychological, physiological or other 
characteristics. 

2. Description of the Related Art 

Scientists have long known that qualities of the human 
voice may indicate the emotions of the speaker. Speech is 
the acoustic response to motion of the vocal cords and the 
vocal tract, and to the resonances of openings and cavities of 
the human head. Air pressure from the lungs is modulated by 
muscular tension of the vocal cords, among other influences. 
Human emotions, as well as certain physiological conditions 
not typically associated with the voice, affect this muscular 
tension, and thereby affect voice modulation. Further, 
speech may also be affected by certain physiological 
conditions, such as dementia, learning disabilities, and vari- 
ous organically-based speech and language disorders. 

Others have attempted to associate emotional qualities 
quantitatively with physical speech characteristics. In U.S. 
Pat. No. 3,855,417, issued to Fuller, the normalized peak 
energy ratio from two frequency bands of a subject’s voice 
is used to determine whether the subject is telling the truth. 
In U.S. Pat. No. 3,855,416, issued to Fuller, a skilled 
interrogator asks the subject questions designed to elicit a 
true or false response. Fuller’s system weighs a measure of 
the vibrato content of the subject’s speech with the peak 
amplitude from a selected frequency band. The interrogator 
derives the veracity of the subject’s statement through a 
comparison of the resulting quantity with a known truthful 
response. 

In U.S. Pat. No. 4,093,821, issued to Williamson, a speech 
analyzer operates on the frequency components within the 
first formant band of a subject’s speech. The analyzer 
examines occurrence patterns in differential first formant 
pitch, rate of change of pitch, duration, and time distribution. 
The analyzer produces three outputs. The first output indicates 
the frequency of nulls or “flat” spots in an 
FM-demodulated first-formant speech signal. Williamson 
discloses that small differences in frequency between short 
adjacent nulls indicate stress, and that large differences in 
frequency between adjacent nulls indicate relaxation. The 
second output indicates the duration of the nulls. According 
to Williamson, the longer the nulls, the higher the stress 
level. The third output is proportional to (1) the ratio of the 
total duration of nulls during a word period to (2) the total 
length of the word period. According to Williamson, an 
operator can determine the emotional state of an individual 
based upon these three outputs. 

U.S. Pat. No. 5,148,483, issued to Silverman, describes a 
method for detecting suicidal predisposition based upon 
speech. The voice analyzer examines the signal amplitude 
decay at the conclusion of an utterance by a test subject, and 
the degree of amplitude modulation of the utterance. The 
subject's speech is filtered and displayed on a time-domain 
strip chart recording. A strip chart recording of a similarly 
filtered speech signal from a mentally healthy person is 
obtained. A skilled operator compares the parameters of 
interest from these two strip charts to determine whether the 
test subject is predisposed to suicide. 
U.S. Pat. No. 4,490,840, issued to Jones, is based upon a 
relationship between so-called “perceptual dimensions” and 
seven “vocal profile dimensions.” The seven vocal dimensions 
include two voice and five speech dimensions, namely: 
resonance, quality, variability-monotone, choppy-smooth, 
staccato-sustain, attack-soft, and affectivity-control. The 
voice, speech and perceptual dimensions require assembly 
from 14 specific properties representative of the voice signal 
in the frequency domain, plus four arithmetic relationships 
among those properties, plus the average differences 
between several hundred consecutive samples in the time 
domain. To arrive at voice style “quality” elements, the 
system relies upon relationships between the lower set and 
the upper set of frequencies in the vocal utterance. The 
speech style elements, on the other hand, are determined by 
a combination of measurements relating to the pattern of 
vocal energy occurrences such as pauses and decay rates. 
The voice style “quality” elements emerge from three spec- 
tral analysis functions, whereas the speech style elements 
result from four other analysis functions. The voice style 
quality analysis elements include spectrum spread, spectrum 
energy balance, and spectrum envelope flatness. The speech 
style elements are spectrum variability, utterance pause ratio 
analysis, syllable change approximation, and high frequency 
analysis. 

Jones relates the seven vocal dimensions and seven per- 
ceptual style dimensions only to the above-described sound 
style elements. Each dimension is described as a function of 
these selected sound style elements. According to Jones’s 
theory, the seven perceptual style dimensions or even dif- 
ferent perceptual, personality or cognitive dimensions can 
be described as a function of the seven sound style elements. 


The limitation in the Jones system to seven speech 
elements apparently constrains the psychological character- 
istics that can be measured by the system. Jones states that 
“[t]he presence of specific emotional content such as fear, 
stress, or anxiety, or the probability of lying on specific 
words, is not of interest to the invention disclosed herein.” 
Col. 5, lines 42-45. 


Each prior art voice analyzer generally relies upon one or 
more highly specific frequency or time characteristics, or a 
combination thereof, in order to derive the emotional state of 
the speaker. None of the references provides flexibility in the 
frequency or time domain qualities that are analyzed. Jones 
allows a variation in the weighting of the seven sound style 
elements, but does not permit variation of the elements 
themselves. Further, all the known prior art characterizations 
of speech rely upon a priori knowledge of speech patterns, 
such as knowledge of vibrato content, properties of speech 
within the first formant, amplitude decay properties, 
staccato-sustain and attack-soft. The prior art does not 
contemplate allowing a flexible variation of the disclosed 
specific time and frequency qualities even though such a 
variation may enable a speech-based assessment to correlate 
strongly with traditional psychological assessments, such as 
the Myers-Briggs test and the MMPI. Such flexibility is highly 
desirable given that the psychological profile of an indi- 
vidual is already difficult to quantify. Further, it is desirable 
to provide a speech analysis system that can also be easily 
adapted to assessing physiological traits of an individual. 


SUMMARY OF THE INVENTION 


The present invention provides a speech-based system for 
assessing psychological, physiological or other characteris- 
tics of a test subject. The system includes a knowledge base 
that stores one or more speech models, where each speech 
model corresponds to a characteristic of a group of reference 
subjects. Signal processing circuitry, which may be imple- 
mented in hardware, software and/or firmware, compares the 
test speech parameters of a test subject with the speech 
models. In one embodiment, each speech model is repre- 
sented by a statistical time-ordered series of frequency 
representations of the speech of the reference subjects. The 
speech model is independent of a priori knowledge of style 
parameters associated with the voice or speech. The system 
includes speech parameterization circuitry for generating the 
test parameters in response to the test subject’s speech. The 
speech parameterization circuitry includes speech acquisi- 
tion circuitry, which may be located remotely from the 
knowledge base. The system further includes output cir- 
cuitry for outputting at least one indicator of a characteristic 
in response to the comparison performed by the signal 
processing circuitry. The characteristic may be time-varying, 
in which case the output circuitry outputs the characteristic 
in a time-varying manner. The output circuitry also may 
output a ranking of each output characteristic. In one 
embodiment, one or more characteristics may indicate the 
degree of sincerity of the test subject, where the degree of 
sincerity may vary with time. The system may also be 
employed to determine the effectiveness of treatment for a 
psychological or physiological disorder by comparing psy- 
chological or physiological characteristics, respectively, 
before and after treatment. 


BRIEF DESCRIPTION OF THE DRAWINGS 


FIG. 1 is a simple block diagram illustrating the speech- 
based assessment system of the present invention. 

FIG. 2 is a functional block diagram illustrating the 
functions performed by the structure of FIG. 1. 

FIG. 3 is a block diagram illustrating one embodiment of 
a speech parameterization process employed by the present 
invention. 

FIG. 4 is a simplified two-dimensional representation of 
an embodiment of the knowledge base employed by the 
present invention. 


FIGS. 5a-5x illustrate a knowledge base for the Luscher 
color test. 


FIG. 6 illustrates an inventive sonogram display illustrat- 
ing time-dependent psychological or physiological charac- 
teristics of the speaker. 

FIG. 7 illustrates the SOCION matrix employed by one 
embodiment of the present invention. 


DETAILED DESCRIPTION OF THE 
INVENTION 


The present invention provides a method and apparatus 
for speech-based psychological or physiological assessment. 
In the following description, numerous details are set forth 
in order to enable a thorough understanding of the present 
invention. However, it will be understood by those of 
ordinary skill in the art that these specific details are not 
required in order to practice the invention. Further, well- 
known elements, devices, process steps and the like are not 
set forth in detail in order to avoid obscuring the present 
invention. 

FIG. 1 is a simple block diagram illustrating the present 
invention. The system includes a microphone input 100 to 
speech acquisition circuitry 102, such as a SOUND 
BLASTER sound card manufactured by Creative Labs. The 
sound card outputs speech data to a CPU 104, which stores 
speech information in memory 106. A display 108 is coupled 
to the CPU to display psychological or physiological 
characteristics determined in response to the speech of a test 
subject speaking into the microphone. 

FIG. 2 is a functional block diagram illustrating the 
functions performed by the structure of FIG. 1. A knowledge 
base 200 stored in memory 106 stores speech parameters 
that are associated with particular psychological or physi- 
ological characteristics. The speech of a test subject is 
correlated with the speech parameters in the knowledge base 
200 by first parameterizing the test subject’s speech 202, and 
then determining the degree of similarity 204 between the 
test subject’s speech parameters and the speech parameters 
in the knowledge base 200. The psychological or physi- 
ological characteristics associated with the speech param- 
eters in the knowledge base that correlate most highly with 
the test subject’s speech parameters are displayed on the 
display 108. The speech parameterization takes place in the 
speech acquisition circuitry 102, which digitizes the speech, 
and in the CPU 104, which converts the digitized speech 
samples into speech parameters, as described below. The 
comparison 204 is carried out by the CPU 104. Of course, 
those skilled in the art will recognize that the circuitry of the 
present invention may be implemented in hardware, 
software, firmware and/or other programmed logic. 

Knowledge Base 

The knowledge base contains speech parameters that are 
correlated with psychological or physiological characteris- 
tics. The knowledge base is created by forming statistically 
large groups of people, where each group exhibits the same 
psychological or physiological characteristic. A larger super- 
set of people is divided into these psychologically or physi- 
ologically homogeneous groups by conducting a psycho- 
logical or physiological assessment, respectively, of the 
superset. As will become apparent from the description 
below, the present invention may be adapted to use any 
psychological or physiological test. For convenience, much 
of the description below concerns psychological 
characteristics, although those skilled in the art will recog- 
nize that the invention may easily be adapted to measure 
physiological characteristics. 

Regardless of the test employed, formation of the knowl- 
edge base requires two basic steps. First, psychologically 
homogeneous groups are formed based upon a psychologi- 
cal assessment, described below. Second, the speech param- 
eters most closely associated with each group are deter- 
mined. To perform this step, each subject (“reference 
subject”) in each group speaks into the microphone. Each 
subject’s speech is then parameterized. The process for 
parameterizing both the reference subjects’ speech to create 
the knowledge base, and the test subject’s speech for the 
later pattern comparison is illustrated in FIG. 3. The speech 
parameters for all the subjects in a group are collected. The 
collected parameters are divided into clusters. The statistics 
of the resulting clusters represent the corresponding psycho- 
logically homogeneous groups. These cluster statistics are 
later compared to the speech parameters of a test subject in 
order to determine the likelihood that the subject falls within 
each psychologically homogeneous group. The formation of 
the knowledge base using the cluster statistics is performed 
off-line for use in such subsequent testing. 

To digitize the speech, the sound card 102 samples the 
sound at a rate of 16,000 16-bit samples per second, i.e., at 32 
KB/s. Each subject speaks into the microphone 100 for at 
least two to three minutes. The subject is instructed to speak 
continuously in a normal tone of voice at a normal speaking 
volume without singing, counting or yelling. Although not 
necessary, each reference subject may be instructed to speak 
the same words. The digitized speech samples from each 
reference subject are stored in memory, e.g., hard disk. 

The CPU 104 reads this data to generate 30 phrases as 
follows. The CPU 104 detects pauses in the speech using 
standard techniques. For example, a pause may be indicated 
when the amplitude of a speech sample drops below five 
times the amplitude of the background noise. The CPU 104 
then determines whether 6,720 samples after the pause occur 
before the next pause. If so, those samples are denoted a 
phrase. Thirty such phrases, each beginning after a pause, 
are categorized as such by the CPU 104. The CPU 104 
divides each phrase into eight states of 840 samples each 
(300). 
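
The phrase and state segmentation described above can be sketched in Python. This is a minimal sketch, not the implementation used by the inventors: it assumes that an amplitude envelope (`envelope`) and a background-noise amplitude (`noise_level`) have already been estimated, and the function names are hypothetical. At 16,000 samples per second, a 6,720-sample phrase lasts 0.42 s and each 840-sample state lasts 52.5 ms.

```python
PHRASE_LEN = 6720                       # samples per phrase
NUM_STATES = 8                          # states per phrase
STATE_LEN = PHRASE_LEN // NUM_STATES    # 840 samples per state


def find_phrases(envelope, noise_level, max_phrases=30, factor=5.0):
    """Return (start, end) index pairs of phrases.

    A sample counts as a pause when its envelope value is below
    factor * noise_level. A phrase is PHRASE_LEN pause-free samples
    that begin immediately after a pause.
    """
    threshold = factor * noise_level
    phrases = []
    i = 0
    n = len(envelope)
    in_pause = False
    while i < n and len(phrases) < max_phrases:
        if envelope[i] < threshold:
            in_pause = True
            i += 1
            continue
        if in_pause:
            # First non-pause sample after a pause: candidate phrase start.
            end = i + PHRASE_LEN
            if end <= n and all(v >= threshold for v in envelope[i:end]):
                phrases.append((i, end))
                i = end
                in_pause = False
                continue
            in_pause = False
        i += 1
    return phrases


def split_into_states(samples, start):
    """Split the phrase beginning at `start` into eight 840-sample states."""
    return [samples[start + k * STATE_LEN:start + (k + 1) * STATE_LEN]
            for k in range(NUM_STATES)]
```

A run of speech shorter than 6,720 samples between two pauses is simply skipped, which matches the requirement that the samples occur before the next pause.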

Using well known speech processing techniques, each 
state is parameterized. For example, the present invention 
may employ the linear predictive coding (LPC) techniques 
described in Chapter 3 of L. Rabiner, B. Juang, Fundamen- 
tals of Speech Recognition, Prentice Hall, 1993 (“Rabiner”). 
The entire text of the Rabiner book is incorporated by 
reference herein. See especially Section 3.3.7 and FIG. 3.37. 

FIG. 3 illustrates the LPC processing steps implemented 
by the CPU 104. Each state, s(i), is put through a low-order 
digital system 302 (typically a first-order FIR filter) to 
spectrally flatten the signal and make it less susceptible to 
finite precision effects later in the signal processing. This 
preemphasis is either fixed or slowly adaptive (e.g., to 
average transmission conditions, noise background, etc.). 
Rabiner uses the preemphasis filter 


$$H(z) = 1 - a z^{-1}, \qquad 0.9 \le a \le 1.0.$$


As a result, the output of the preemphasis filter, s'(i), is 
related to the input to the filter s(i) by the difference equation 


$$s'(i) = s(i) - a\,s(i-1).$$


A common value for a is 0.95. 
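
The first-order preemphasis filter can be sketched as follows. The function name is hypothetical, and treating the sample before the start of the state as zero is an assumption made here for the boundary case.

```python
def preemphasize(samples, a=0.95):
    """Apply s'(i) = s(i) - a * s(i-1) to a sequence of samples.

    The sample preceding the first one is taken to be 0 (an assumption).
    """
    out = []
    prev = 0.0
    for s in samples:
        out.append(s - a * prev)
        prev = s
    return out
```

A constant input is strongly attenuated after the first sample, which illustrates how the filter suppresses low-frequency content relative to high-frequency content.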

The preemphasized signal s'(i) is then blocked into frames, 
$x_l(n)$, where n=0, 1, . . . , N-1 and l=0, 1, . . . , L-1 (304). Each 
frame consists of N speech samples, and each state comprises 
L frames. The frames are separated by M samples. 

The next step requires that each frame be windowed to 
minimize the high frequency components caused by the 
discontinuities at the beginning and end of each frame (306). 
In one embodiment, each state is 840 samples long, comprising 
L=5 frames of N=360 samples that overlap by 240 
samples, so that adjacent frames are separated by M=120 
samples. 

The result of windowing is the signal 


$$\tilde{x}_l(n) = x_l(n)\,w(n), \qquad 0 \le n \le N-1$$
where typically the Hamming window 
$$w(n) = 0.54 - 0.46 \cos\left(\frac{2\pi n}{N-1}\right), \qquad 0 \le n \le N-1$$


is used. 

This window is first applied to samples 0 through 359 of 
the state, then 120 through 479, then 240 through 599 and so 
on until five windowed frames for each state are generated. 
As will be seen below, the center windowed frame (l=2) will 
be used in computing the cepstral coefficients, whereas the 
other windowed frames will be employed in calculating the 
temporal cepstral derivative coefficients, i.e., the delta 
cepstral vector. 
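
Frame blocking and Hamming windowing with the stated values N=360, M=120 and L=5 can be sketched as follows; the function names are hypothetical.

```python
import math

N = 360   # samples per frame
M = 120   # shift between adjacent frames
L = 5     # frames per state


def hamming(length):
    """Hamming window w(n) = 0.54 - 0.46 cos(2 pi n / (length - 1))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def windowed_frames(state):
    """Block an 840-sample state into L overlapping frames and window each."""
    if len(state) != M * (L - 1) + N:
        raise ValueError("state must contain exactly 840 samples")
    w = hamming(N)
    return [[state[l * M + n] * w[n] for n in range(N)] for l in range(L)]
```

Frame l covers samples 120·l through 120·l + 359, so the last frame (l=4) ends at sample 839 and the state is covered exactly.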

The present invention characterizes the speech states 
using cepstral coefficients, which are derived from the 
standard LPC coefficients. The cepstral coefficients provide 
a useful and compact characterization of speech. As an 
intermediate step, each center frame of the windowed signal 
is autocorrelated to give 


$$r_l(m) = \sum_{n=0}^{N-1-m} \tilde{x}_l(n)\,\tilde{x}_l(n+m)$$


where l=2 and m=0, 1, . . . , p, and p is the highest order of 
the autocorrelation analysis (308). Typically, p ranges from 
8 to 16. As an example, the Inventors have used p=11. The 
zeroth autocorrelation, $r_l(0)$, is the energy of the l-th frame. 
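
The autocorrelation of a windowed frame up to order p can be sketched as follows; the function name is hypothetical.

```python
def autocorrelate(frame, p=11):
    """Return [r(0), r(1), ..., r(p)] with r(m) = sum_n frame[n] * frame[n+m].

    The sum runs over n = 0 .. len(frame) - 1 - m.
    """
    length = len(frame)
    return [sum(frame[n] * frame[n + m] for n in range(length - m))
            for m in range(p + 1)]
```

For example, the frame (1, 2, 3) gives r(0) = 14, r(1) = 8 and r(2) = 3.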

The autocorrelation is employed to compute the linear 
prediction coefficients $a_m$ of the following recursion 
equation, which provides a good approximation of the vocal 
tract 


$$\tilde{x}_l(n) = \sum_{m=1}^{p} a_m\,\tilde{x}_l(n-m), \qquad p = 11$$


The LPC coefficients are determined by converting the 
autocorrelation coefficients using a technique known as 
Durbin's method, which is basically the same as the 
Cholesky decomposition (310). Durbin’s method may be 
implemented by the following algorithm (for convenience, 
the subscript l on $r_l(m)$ is omitted). 

Initialize 


$$E^{(0)} = r(0)$$
$$k_0 = -r(1)/r(0)$$
$$a_1^{(1)} = k_0$$
$$E^{(1)} = (1 - k_0^2)\,E^{(0)}$$


Then recursively compute 


$$k_m = -\left[ r(m+1) + \sum_{i=1}^{m} a_i^{(m)}\, r(m+1-i) \right] / E^{(m)}$$
$$a_i^{(m+1)} = a_i^{(m)} + k_m\, a_{m+1-i}^{(m)}$$
$$a_{m+1}^{(m+1)} = k_m$$
$$E^{(m+1)} = E^{(m)}\,(1 - k_m^2)$$


for $1 \le i \le m$, $1 \le m \le p-1$. The results of these calculations are 
the linear prediction coefficients $a_m = a_m^{(p)}$ for $1 \le m \le p$, 
where the parenthetical superscript refers to the iteration 
number. The cepstral coefficients, $c_m$, are computed from the 
LPC coefficients as follows (312). The cepstral coefficients 
characterize the cepstrum. 


$$c_m = a_m + \sum_{k=1}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad 1 \le m \le p$$


The zeroth cepstral coefficient is the energy of the center 
frame (representing the energy of the state) and is given by 
$10 \log_{10} r(0)$. 
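
Durbin's recursion can be sketched in Python as follows. The sketch follows the initialization k_0 = -r(1)/r(0) stated above; under that sign convention the returned coefficients satisfy r(i) + Σ a_m r(|i-m|) = 0 for i = 1, ..., p, so a sample is predicted by the negated weighted sum of its predecessors. Texts that write the predictor with a plus sign use the negatives of these values. The function name is hypothetical.

```python
def durbin(r, p):
    """Levinson-Durbin recursion on autocorrelations r[0..p].

    Returns (a, err), where a[m-1] holds a_m for m = 1..p and err is the
    final prediction error energy E^(p).
    """
    if r[0] <= 0.0:
        raise ValueError("r[0] (frame energy) must be positive")
    a = [0.0] * (p + 1)          # a[1..p]; index 0 is unused
    k = -r[1] / r[0]
    a[1] = k
    err = (1.0 - k * k) * r[0]
    for m in range(1, p):
        acc = r[m + 1] + sum(a[i] * r[m + 1 - i] for i in range(1, m + 1))
        k = -acc / err
        prev = a[:]
        for i in range(1, m + 1):
            a[i] = prev[i] + k * prev[m + 1 - i]
        a[m + 1] = k
        err *= (1.0 - k * k)
    return a[1:], err
```

The recursion needs on the order of p² operations, compared with p³ for a general linear solve of the same Toeplitz system.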

As described in Rabiner, the cepstral coefficients are then 
weighted to minimize the sensitivity of the low-order cep- 
stral coefficients to overall spectral slope and the sensitivity 
of the high-order cepstral coefficients to noise, as follows 
(314). 
$$\hat{c}_m = w_m c_m, \qquad 1 \le m \le p$$
where 
$$w_m = \begin{cases} 1 + 0.5\,p \sin(\pi m / p), & 0 < m < p+1 \\ 0, & p < m \end{cases}$$
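
The LPC-to-cepstrum recursion and the cepstral weighting can be sketched as follows. The sketch evaluates both formulas as written and takes the coefficients a_m as given; it assumes they follow the sign convention used in Rabiner, where the predictor is a plain weighted sum of past samples. The function names are hypothetical, and the zeroth (energy) coefficient is handled separately.

```python
import math


def lpc_to_cepstrum(a):
    """Compute c_1..c_p from a_1..a_p (a[m-1] holds a_m).

    c_m = a_m + sum_{k=1}^{m-1} (k/m) * c_k * a_{m-k}
    """
    p = len(a)
    c = [0.0] * (p + 1)          # c[1..p]; index 0 is unused here
    for m in range(1, p + 1):
        c[m] = a[m - 1] + sum((k / m) * c[k] * a[m - k - 1]
                              for k in range(1, m))
    return c[1:]


def weight_cepstrum(c):
    """Apply w_m = 1 + 0.5 * p * sin(pi * m / p) to c_1..c_p."""
    p = len(c)
    return [(1.0 + 0.5 * p * math.sin(math.pi * m / p)) * c[m - 1]
            for m in range(1, p + 1)]
```

As a check on the recursion, a single-pole model with a_1 = a and all other coefficients zero yields c_m = a^m / m.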


To improve the representation of the speech spectrum, the 
analysis is extended to include information about the tem- 
poral cepstral derivative, which introduces temporal order 
into the representation (316). The so-called delta cepstral 
coefficients are an approximation to the time derivatives of 
the cepstral coefficients. They are given by the equation 


$$\Delta\hat{c}_m(l) = \frac{3}{5} \sum_{k=-K}^{K} k\,\hat{c}_m(l+k)$$


where K=2 and l=2, the time index (frame number) that 
denotes the central windowed frame in a state. The zeroth 
through eleventh coefficients of the complete cepstral vector 
c comprise the central frame (l=2) $\hat{c}_m$ coefficients for 
$0 \le m \le p$, where p=11. 

The 12th through 23rd coefficients of c are $c_{m+12} = \Delta\hat{c}_m(2)$ 
for $0 \le m \le 11$. As a result, there is one c vector (denoted the 
“cepstral vector” for convenience) for each state. The vector 
may be expressed as 


$$c = (\hat{c}_0, \hat{c}_1, \hat{c}_2, \ldots, \hat{c}_{11}, \Delta\hat{c}_0, \Delta\hat{c}_1, \ldots, \Delta\hat{c}_{11})$$


where the arguments for the $\Delta\hat{c}$ terms have been omitted 
because it is assumed that l=2. 
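
The delta cepstral computation and the assembly of the 24-coefficient vector can be sketched as follows. The scale factor in front of the sum is passed in as the parameter `gain` rather than fixed in the code; that parameter name and the function names are hypothetical.

```python
def delta_cepstrum(frames_c, gain, K=2):
    """Delta coefficients for the central frame of 2K+1 frames.

    frames_c[l][m] is the weighted cepstral coefficient m of frame l.
    Returns gain * sum_{k=-K}^{K} k * frames_c[K + k][m] for each m.
    """
    if len(frames_c) != 2 * K + 1:
        raise ValueError("expected 2K+1 frames")
    center = K
    count = len(frames_c[center])
    return [gain * sum(k * frames_c[center + k][m] for k in range(-K, K + 1))
            for m in range(count)]


def cepstral_vector(frames_c, gain, K=2):
    """Central-frame coefficients followed by their delta coefficients."""
    return list(frames_c[K]) + delta_cepstrum(frames_c, gain, K)
```

With 12 coefficients per frame (indices 0 through 11), the result has 24 entries: entries 0 to 11 come from the central frame and entries 12 to 23 are the corresponding deltas.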

The final step in the computation of the cepstral vectors is 
energy normalization (318). The zeroth component is 
replaced by the definition 


$$\hat{c}_0 = c[0] - EN_{\max},$$
where $\hat{c}_0$ is the normalized energy of the state, 
$$c[0] = \max[10 \log_{10} r(0),\ 0]$$


$$EN_{\max} = \max(c[0])$$ for all states within a phrase. 


As a result, for p=11, a 24-coefficient vector 


$$c = (\hat{c}_0, \hat{c}_1, \hat{c}_2, \ldots, \hat{c}_{11}, \Delta\hat{c}_0, \Delta\hat{c}_1, \ldots, \Delta\hat{c}_{11})$$


characterizes each state. A total of 240 such cepstral vectors 
characterize the eight states in 30 phrases for each reference 
subject. 
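
Energy normalization within a phrase can be sketched as follows. This is a sketch under a stated assumption: the normalized energy is taken to be the state's log energy, floored at 0 dB, minus the maximum such value over the states of the phrase, with no further scaling. The function name is hypothetical.

```python
import math


def normalized_energies(r0_per_state):
    """Normalize state energies within one phrase.

    r0_per_state holds r(0) of the center frame of each state.
    Each log energy max(10*log10(r0), 0) is offset by the phrase maximum,
    so the loudest state maps to 0 and all others are non-positive.
    """
    c0 = [max(10.0 * math.log10(r0), 0.0) if r0 > 0.0 else 0.0
          for r0 in r0_per_state]
    en_max = max(c0)
    return [c - en_max for c in c0]
```

Normalizing against the phrase maximum makes the energy component insensitive to overall recording level, so that it reflects the energy contour within the phrase rather than absolute loudness.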

To complete the characterization of all the reference 
subjects in a psychologically homogeneous group, the CPU 
sorts the vectors representing each state into a set of three 
clusters 400 for each state, as shown in a simplified two- 
dimensional representation in FIG. 4. Clusterization can be 
performed using the K-means algorithm described in 
Rabiner, e.g., § 3.4.4. Note that each reference subject is 
characterized by 30 vectors per state, one from each of the 
30 phrases uttered by each reference subject. Accordingly, 
30xR vectors are sorted into clusters for each state, where R 
is the number of reference subjects in a psychologically 
homogeneous group. 

In one embodiment, the present invention may employ the 
K-means algorithm described in Rabiner or a variation 
thereof. According to this variation, the algorithm first 
computes a matrix of distances between each cepstral vector 
and all other cepstral vectors representing a particular state. 
The distance is the usual Euclidean distance in 24 
dimensions, except that the square of the difference of the 
zeroth component (related to energy of the state) is weighted 
by multiplying it by 3 instead of unity as for the other 
components. The distance matrix is used to compute the 
maximum distance between vectors, DMAX, and the mean 
distance between vectors, DMEAN. A quantity MAXDIST 
is calculated as min (1.4 DMEAN, 0.8 DMAX). 
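
The weighted distance and the MAXDIST threshold can be sketched as follows. The sketch assumes that DMEAN and DMAX are taken over all distinct pairs of vectors; the function names are hypothetical.

```python
import math

ENERGY_WEIGHT = 3.0   # weight on the squared difference of component 0


def weighted_distance(u, v):
    """Euclidean distance with the zeroth squared difference weighted by 3."""
    total = ENERGY_WEIGHT * (u[0] - v[0]) ** 2
    total += sum((x - y) ** 2 for x, y in zip(u[1:], v[1:]))
    return math.sqrt(total)


def max_dist_threshold(vectors):
    """Return MAXDIST = min(1.4 * DMEAN, 0.8 * DMAX) over distinct pairs."""
    n = len(vectors)
    dists = [weighted_distance(vectors[i], vectors[j])
             for i in range(n) for j in range(i + 1, n)]
    dmax = max(dists)
    dmean = sum(dists) / len(dists)
    return min(1.4 * dmean, 0.8 * dmax)
```

For the three two-dimensional vectors (0, 0), (1, 0) and (0, 2), the pairwise distances are √3, 2 and √7, so the threshold is 0.8·√7.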

Next, the algorithm sorts into one cluster those vectors 
which are a distance of at least MAXDIST from all other 
vectors. The remaining vectors form a second cluster, the 
centroid of which is determined. Next, the larger cluster, i.e., 
the one having the maximum average intra-cluster distance, 
or variance in 24 dimensions, is determined. This may be the 
first cluster formed in the first step. The larger cluster is then 
divided into two clusters. This is accomplished by finding 
the two vectors in it that are farthest from each other, and 
choosing them as cluster centers. All the vectors that are not 
one of the three cluster centers are then assigned to the 
nearest neighbor cluster center, i.e., the cluster center to 
which an individual vector is closest. This process results in 
three clusters 400 per state. 

The three cluster centroids are then recalculated. The 
distances of all the vectors in all three clusters are computed 
from each newly-calculated center. The vectors are then 
redistributed among the clusters so that each vector is closest 
to its nearest-neighbor cluster center. The centroids for these 
newly formed clusters are then calculated, and the redistri- 
bution process is continued until no vector is reassigned 
from one cluster to another. The result is three clusters 400 
for each of the eight states within a psychologically homo- 
geneous group (speech model 402) stored in the knowledge 
base. These clusters form the knowledge base. Cluster 
statistics are collected for use in the comparison with the 
speech parameters of a test subject. The following statistics 
are collected for each state within a psychologically homo- 
geneous group: 

cluster centers (3) 

dispersion (3) 

meansegen 

maxsegen 

minsegen 

meanseglen 

maxseglen 

minseglen 

cluster component weights (3) 

mean vector 

transition matrix 

The cluster centers are the centroids of the three clusters 
representing the psychologically homogeneous group. The 
dispersion is the mean square dispersion about each center 
in each of the 24 dimensions. In addition, the mean, mini- 
mum and maximum energies (meansegen, minsegen, 
maxsegen) for each state represent the mean, minimum and 
maximum energy statistics, respectively, of each state over 
all 30 phrases for all reference subjects. The energy of each 
individual state is derived from the zeroth component of its 
corresponding cepstral vector. The weight of a cluster rep- 
resents the fraction of vectors within that cluster. The mean 
vector is the average of all cepstral vectors for a given state 
within a homogeneous group. 
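
The redistribution loop and three of the per-cluster statistics (center, dispersion and weight) can be sketched as follows. The sketch starts from three initial centers supplied by the caller and uses the energy-weighted squared distance; leaving an empty cluster's center unchanged is an assumption made here, and the function names are hypothetical.

```python
def weighted_sq_dist(u, v, energy_weight=3.0):
    """Squared distance with component 0 weighted by energy_weight."""
    return (energy_weight * (u[0] - v[0]) ** 2
            + sum((x - y) ** 2 for x, y in zip(u[1:], v[1:])))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def refine_clusters(vectors, centers, max_iter=100):
    """Reassign vectors to the nearest center until no vector moves."""
    centers = [list(c) for c in centers]
    assign = None
    for _ in range(max_iter):
        new_assign = [min(range(len(centers)),
                          key=lambda c: weighted_sq_dist(v, centers[c]))
                      for v in vectors]
        if new_assign == assign:
            break
        assign = new_assign
        for c in range(len(centers)):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:                      # empty cluster keeps its center
                centers[c] = centroid(members)
    return assign, centers


def cluster_statistics(vectors, assign, centers):
    """Center, per-dimension mean square dispersion, and weight per cluster."""
    dim = len(vectors[0])
    stats = []
    for c, center in enumerate(centers):
        members = [v for v, a in zip(vectors, assign) if a == c]
        if members:
            disp = [sum((v[d] - center[d]) ** 2 for v in members) / len(members)
                    for d in range(dim)]
        else:
            disp = [0.0] * dim
        stats.append({"center": center, "dispersion": disp,
                      "weight": len(members) / len(vectors)})
    return stats
```

The cluster weights sum to one because every vector is assigned to exactly one cluster.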

The invention later compares the cluster statistics in the 
knowledge base with the speech parameters of a test subject 
(204). Those skilled in the art will recognize that a wide 
variety of speech pattern comparison techniques may be 
employed for this purpose. A number of these techniques are 
described in Rabiner. In one embodiment, the present invention 
uses a hidden Markov model to characterize speech, as 
discussed in Rabiner, Chapter 6 (already incorporated by 
reference herein), and C. H. Lee, L. R. Rabiner, “Frame- 
Synchronous Network Search Algorithm for Connected 
Word Recognition,” IEEE Transactions on Acoustics, 
Speech, and Signal Processing Vol. 37, No. 11, November 
1989 (“Lee”), which is also incorporated by reference 
herein. Under that model, the invention first optimizes the 
knowledge base using the Viterbi algorithm. Then, during 
pattern comparison the invention again employs the Viterbi 
algorithm to determine the similarity of the test subject’s 
speech parameters to those in the knowledge base. The 
calculations of the Viterbi similarity values are very well 
known in the art and widely described in the literature. In 
one embodiment, the present invention employs the modi- 
fied Viterbi algorithm described in Lee. 

The transition matrix is used in the pattern comparison 
process as part of the Viterbi algorithm. The transition 
matrix is stored in the knowledge base and later modified by 
the Viterbi algorithm. To create the initial transition matrix, 
an initial state duration (seglen) for each of the eight states 
is computed according to the following pseudo code. 

Compute the mean energy (E_mean) over all the states, i.e., 
add the meansegen for all 8 states within a group in the 
knowledge base and divide by 8. 

1. ACC=0 (energy accumulator=0) 
2. old_i=0 
3. i=0 
4. k=0 
5. ACC=ACC+meansegen(i) 
6. if (ACC>=E_mean) then 
7. ACC=0 
8. seglen(k)=i-old_i-1 
9. old_i=i-1 
10. i=i-1 
11. k=k+1 
12. if (k>7) go to 19 
13. endif 
14. i=i+1 
15. if (i>7) go to 18 
16. continue 
17. go to 5 
18. if (k<8) seglen(k)=i-old_i 
19. end 

This algorithm produces a set of values for the state 
durations seglen(k) for the states k=0, 1, . . . , 7. Those skilled 
in the art will recognize that other well-known techniques 
may be substituted to optimize the state durations. 

The next step in the construction of the knowledge base 
for later use in a Viterbi pattern comparison is the compu- 
tation of an initial transition matrix. The transition matrix 
characterizes a first-order Markov process. The matrix com- 
prises all zero elements except for the diagonal and super- 
diagonal elements. The diagonal elements are 

$$A_{k,k} = \ln(a_{k,k}),$$ where 


$$a_{k,k} = \frac{1}{1 + \mathrm{seglen}(k)},$$


and seglen(k) is the length of the kth state. The superdiagonal 
elements are given by $A_{k,k+1} = \ln(a_{k,k+1})$, where 
$a_{k,k+1} = 1 - a_{k,k}$, for k=0, 1, . . . , 7. 
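
Construction of the initial transition matrix from the state durations can be sketched as follows. The sketch rests on stated assumptions: the diagonal probability is read as a_{k,k} = 1/(1 + seglen(k)), the superdiagonal probability as 1 - a_{k,k}, every seglen(k) is positive, and the last state has no superdiagonal entry inside the square matrix. The function name is hypothetical.

```python
import math


def initial_log_transition(seglen):
    """Build the initial transition matrix from state durations.

    Diagonal:      ln(a_kk), with a_kk = 1 / (1 + seglen[k])   (assumed reading)
    Superdiagonal: ln(1 - a_kk)
    All other elements are left at zero, as the text describes.
    """
    n = len(seglen)
    if any(s <= 0 for s in seglen):
        raise ValueError("state durations must be positive")
    A = [[0.0] * n for _ in range(n)]
    for k in range(n):
        a_kk = 1.0 / (1.0 + seglen[k])
        A[k][k] = math.log(a_kk)
        if k + 1 < n:
            A[k][k + 1] = math.log(1.0 - a_kk)
    return A
```

Storing logarithms lets a Viterbi-style comparison add scores along a path instead of multiplying probabilities, which avoids numerical underflow over long utterances.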
This initial transition matrix is optimized using the Viterbi 
algorithm. The Viterbi algorithm generates a similarity measure 
or distance that is proportional to the logarithm of the 
probability of similarity of a vector to the speech model (for 
a particular homogeneous group) stored in the knowledge 
base. The probability of being in the most likely one of the 
three clusters (i.e., the closest cluster) for each state is noted 
and the product of these probabilities for all eight states in 
a phrase is kept as the chance that that phrase fits the model 
for a particular homogeneous group. This process is repeated 
for all 30 phrases to arrive at a total probability that the 
30-phrase utterance belongs to a particular homogeneous 
group in the knowledge base. The total probability for all 30 
phrases is the product of the probabilities for each phrase. 
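
The combination of per-state probabilities into a total score can be sketched in the log domain, where products become sums. The sketch assumes that the log-probability of each of the three clusters has already been computed for every state of every phrase; the nested-list layout and the function name are hypothetical.

```python
def utterance_log_probability(cluster_log_probs):
    """Total log-probability of an utterance for one homogeneous group.

    cluster_log_probs[phrase][state] is a list of log-probabilities, one per
    cluster. For each state the most likely cluster is kept; the logs are
    summed over states and phrases, which equals the log of the product.
    """
    total = 0.0
    for phrase in cluster_log_probs:
        for state in phrase:
            total += max(state)
    return total
```

Comparing these totals across the groups in the knowledge base then identifies the groups that fit the utterance best.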

The Viterbi algorithm is employed to optimize the knowl- 
edge base by comparing all 30 phrases for each reference 
subject with the homogeneous group in the knowledge base 
to which the reference subject belongs (i.e., the speech 
model for that group). The Viterbi distance between each 
reference subject's cepstral vectors and the closest cluster 
within a three-cluster set is recorded for each state in the 
reference subject's homogeneous group in the knowledge 
base. The Viterbi distance for each phrase is then calculated, 
as described above. The Viterbi algorithm is then iterated to 
obtain the optimum state duration for the comparison of a 
phrase of the reference subject's speech to the speech model 
of the homogeneous group to which the reference subject 
belongs. The optimum state duration produced at every step 
is averaged over the phrases and the iterations with the 
variable mean seglen (initially seglen) to produce a new 
mean seglen value. The mean seglen value is substituted for 
seglen in the calculation of the diagonal and super-diagonal 
elements of the transition matrix. The iteration process is 
continued for approximately 3 to 7 iterations. The most 
likely model, i.e., the model resulting in the highest total 
probability for all 30 phrases, is retained in case the quality 
deteriorates after more iterations. This process is described 
in the Lee paper, incorporated by reference herein. At the 
optimum state duration, the Viterbi distance between the 30 
phrases and the model for that homogeneous group is 
minimized. The result is a transition matrix that is used later 
in the pattern comparison process. 

Pattern Comparison 

The speech parameters of a test subject are compared to 
the cluster statistics for each psychologically homogeneous 
group in order to determine which groups correlate most 
highly to the test subject. The test subject may be instructed 
to speak the same words as the reference subjects. Like the 
speech of a reference subject, the test subject's speech is 
digitized by a sound card. The CPU divides the test subject's 
speech into 30 phrases, and divides each phrase into eight 
states. The 30 phrases are parameterized into 240 cepstral 
vectors. Unlike the vectors generated for the reference 
subjects, the test subject's vectors are not clustered. 

The thirty-phrase utterance for the test subject is com- 
pared to each homogeneous group in the knowledge base. 
This comparison is made phrase by phrase and for each 
state. The distance between the test subject's state cepstral 
vectors and the closest cluster within a three-cluster state set 
is used as a state distance measure in the Viterbi algorithm. The 
Viterbi algorithm is iterated to adjust the state durations, in 
a similar manner as that described above, in order to 
minimize the Viterbi probability or distance between the test 
subject's vectors representing a phrase and a speech model 
for a homogeneous group (i.e., the eight three-cluster sets 
representing the group). The total probability of an utterance 
matching the model is calculated by multiplying the prob- 
abilities of all 30 phrases. The psychological characteristics 
associated with the speech models that register the highest of 
these optimized probabilities (either on a phrase basis or, 
alternatively, the total probability of a 30-phrase utterance) 
are deemed to be the characteristics representing the psy- 
chological makeup of the test subject. 
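
For illustration only, the selection of the best-matching groups may be sketched as follows, using the total probability of the utterance. The sketch assumes that per-phrase log probabilities are already available for each group; the group names and values are hypothetical.

```python
def rank_groups(phrase_log_probs):
    """Rank groups by total log probability over all phrases, best first.

    phrase_log_probs maps a group name to its list of per-phrase
    log probabilities for the test utterance.
    """
    totals = {group: sum(values) for group, values in phrase_log_probs.items()}
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical per-phrase log probabilities (three phrases, three groups).
scores = {
    "group A": [-12.0, -11.5, -13.0],
    "group B": [-10.0, -10.5, -9.5],
    "group C": [-15.0, -14.0, -16.0],
}
ranking = rank_groups(scores)
```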

Those skilled in the art will recognize that a wide variety 
of speech characterization and comparison techniques can 
easily be employed to practice the present invention, and 
thus, the present invention is not limited to the exemplary 
techniques described herein. 

Pattern Comparison with Myers-Briggs Knowledge Base 

The above discussion generally describes how the speech 
of a test subject may be correlated with psychologically 
homogeneous groups in the knowledge base. In particular, 
the knowledge base may be broken down into groups 
corresponding to the 16 Jungian character types generated 
by the well-known Myers-Briggs Personality Assessment. 
For an explanation of this assessment and how it is 
administered, please refer to I. B. Myers, Manual: A Guide 
to the Development and Use of the Myers-Briggs Type 
Indicator, Consulting Psychologists Press, Inc., Palo Alto, 
Calif., 1985, which is incorporated by reference herein. 
These 16 types, numbered for convenience, are as follows: 


1. ENTP 
2. ISFP 
3. ESFJ 
4. INTJ 
5. ENFP 
6. ISTP 
7. ESTJ 
8. INFJ 
9. ESFP 
10. INTP 
11. ENTJ 
12. ISFJ 
13. ESTP 
14. INFP 
15. ENFJ 
16. ISTJ 


To form the knowledge base of Myers-Briggs types, the 
superset of reference subjects is assessed using the Myers- 
Briggs test. According to the test results, the superset is 
broken down into psychologically homogeneous groups of 
individuals corresponding to the 16 Jungian character types. 
Then, as described above, the speech parameters of these 
reference subjects are collected, clustered, and Viterbi- 
optimized in order to provide a speech representation for 
each character type. 

To perform the pattern comparison, 30 phrases of eight 
states each are collected from the test subject, as before. 
These 30 phrases are converted into 240 cepstral vectors. As 
before, the eight state cepstral vectors corresponding to the 
first phrase are compared using the Viterbi algorithm with 
the three-cluster sets representing each state for the first 
Jungian character type. The first phrase is similarly com- 
pared to the other 15 character types. This process is 
repeated for the 2nd through 30th phrases. The result is 
30x16=480 Viterbi similarity values. This data is reduced by 
assigning to each phrase only the character type that resulted 
in the highest similarity value for the phrase. This results in 
30 types corresponding to the 30 phrases. Invariably 
(because there are fewer types than phrases), some types will 
show up as corresponding to more than one phrase. 
Accordingly, the frequency of occurrence of each type is 
divided by 30 to yield the proportion of the total personality 
space for the test subject. Only types that account for more 
than 4% (i.e., occur more than once) are retained by the 
program. The CPU then causes these character types to be 
displayed along with the corresponding percentage of the 
test subject’s personality space. In this manner, the assess- 
ment system of the invention recognizes that each individual 
may comprise a combination of personality types that are 
present in differing degrees. 
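
The data reduction described above may be sketched as follows. This is a minimal illustration rather than the program itself; the function names and the example similarity values are hypothetical, and the 4% threshold is taken from the description.

```python
from collections import Counter


def best_type_per_phrase(similarity):
    """For each phrase, keep only the type with the highest similarity."""
    return [max(row, key=row.get) for row in similarity]


def personality_space(winners, threshold=0.04):
    """Map each retained type to its share of the phrases."""
    total = len(winners)
    shares = {t: n / total for t, n in Counter(winners).items()}
    return {t: share for t, share in shares.items() if share > threshold}


# Hypothetical similarity values for three phrases and two types.
example = [
    {"ENTP": -10.0, "INTJ": -12.5},
    {"ENTP": -9.0, "INTJ": -8.0},
    {"ENTP": -7.5, "INTJ": -11.0},
]
example_winners = best_type_per_phrase(example)

# Hypothetical winners for a full 30-phrase utterance.
winners = ["ENTP"] * 18 + ["INTJ"] * 11 + ["ISFP"]
space = personality_space(winners)
```

In this example ISFP wins only one phrase (about 3.3%), so it falls below the threshold and is dropped.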

In another embodiment, four scales can be created for the 
Myers-Briggs Jungian character types. In this scheme, there 
are four sets of opposite character constructs, E-I (extrovert- 
introvert), S-N (sensoric-intuitive), T-F (thoughtful-feeling), 
and J-P (decision maker-plagued). For the 30 types that 
correlate most highly to the 30 phrases, the number of 
phrases that exhibit the first construct in the corresponding 
type is subtracted from the number of phrases that exhibit 
the second construct. For example, for the first scale, the 
number of phrases that have E’s in their corresponding type 
is subtracted from the number of phrases which have I’s in 
their corresponding types. This difference is multiplied by a 
factor and a constant is added to create a range that runs from 
0 to 100, or whatever range is most convenient for a raw 
score. For example, for 30 phrases the possible differences 
run from minus 30 to plus 30. Therefore, multiply by 5/3 and 
add 50 to obtain a range from 0 to 100. This method may be 
extended to compute other scales related to different tests. 
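
A hedged sketch of the four-scale computation follows. It assumes a multiplier of 5/3, which is the factor that maps differences of minus 30 to plus 30 onto the range 0 to 100 once the constant 50 is added. The names used are hypothetical.

```python
# Position of each construct pair within a four-letter type code.
SCALES = {"E-I": 0, "S-N": 1, "T-F": 2, "J-P": 3}


def scale_scores(types, factor=5.0 / 3.0, offset=50.0):
    """Score each scale from the per-phrase best-matching types.

    The count of phrases showing the first construct is subtracted from
    the count showing the second, then rescaled to the range 0 to 100.
    """
    scores = {}
    for name, position in SCALES.items():
        first, second = name.split("-")
        n_first = sum(t[position] == first for t in types)
        n_second = sum(t[position] == second for t in types)
        scores[name] = (n_second - n_first) * factor + offset
    return scores


# Hypothetical utterance in which every phrase matched ENTP.
scores = scale_scores(["ENTP"] * 30)
```

With all 30 phrases matching ENTP, the E-I and T-F scales sit at the bottom of the range and the S-N and J-P scales at the top.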

Pattern Comparison Using the Luscher Color Knowledge Base 

A knowledge base may be formed using the well-known 
Luscher Color test. The test is based upon the order of 
preference that the reference subjects have for the eight 
Luscher colors: gray, blue, green, red, yellow, violet, brown, 
and black. For an explanation of the Luscher test and how it 
is administered, please refer to M. Luscher (translated by I. 
Scott), The Luscher Color Test, Washington Square Press, 
1969, which is incorporated by reference herein. The 
Luscher test is administered to the superset of reference 
subjects, which is divided into eight homogeneous groups 
corresponding to the eight Luscher colors. The speech 
parameters of these groups are generated and stored in the 
knowledge base using the techniques described above. As an 
example, most of the knowledge base statistics for the 
Luscher test are illustrated in FIGS. 5a-5x. Note that the 
transition matrix is not in logarithmic form, but in $a_{k,k}$ and 
$a_{k,k+1}$ form. 

To perform the pattern comparison, each phrase of the test 
subject’s speech is compared to each of the eight Luscher 
groups in the knowledge base. For each phrase, the Viterbi 
similarity values corresponding to the eight colors are 
ranked in order from highest degree of similarity to 
lowest. These ranked colors are then sorted into five color 
couples according to the Luscher technique. This procedure 
is repeated for the second through thirtieth phrases, so that 
there are five color couples for each phrase. Note that the 
first four couples are formed by pairing the colors in the 
order in which they occur. The fifth couple comprises the 
first color paired with the last. For example, if the Luscher 
sequence in order of preference is blue, red, gray, yellow, 
green, violet, black, brown, then the Luscher couples would 
be (+blue +red, xgray xyellow, =green =violet, -black 
-brown, +blue -brown). 
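
The pairing rule may be illustrated with the following sketch. The sign prefixes (+, x, =, -) follow the customary Luscher notation and are an assumption here, as is the function name.

```python
def luscher_couples(ranked_colors):
    """Pair eight ranked colors into five couples.

    The first four couples pair the colors in rank order; the fifth
    pairs the first color with the last.
    """
    if len(ranked_colors) != 8:
        raise ValueError("expected eight ranked colors")
    signs = ["+", "x", "=", "-"]
    couples = [
        (signs[i] + ranked_colors[2 * i], signs[i] + ranked_colors[2 * i + 1])
        for i in range(4)
    ]
    couples.append(("+" + ranked_colors[0], "-" + ranked_colors[-1]))
    return couples


ranking = ["blue", "red", "gray", "yellow", "green", "violet", "black", "brown"]
couples = luscher_couples(ranking)
```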

The number of times a color couple appears in the first 
position is divided by 30 to yield the proportion that the 
color couple appears in the first position. This process is 
repeated for the second, third, fourth and fifth couple posi- 
tions. Only color couples that appear in a particular position 
more than 4% of the time are retained by the program. For 
each color couple position, the system displays a descriptive 
paragraph concerning the psychological characteristics asso- 
ciated with the selected color couples, along with the per- 
centage of occurrence that the couple appears in a particular 
position. One example of such descriptive paragraphs is 
found in the Luscher book. These paragraphs may be 
modified, particularly by directing one set of descriptive 
paragraphs to lay people and another set to psychology 
professionals, without deviating from the basic meaning of 
the original Luscher descriptive paragraphs. 
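
The per-position tally of color couples may be sketched as follows. The sketch is illustrative only; the function name and the example data are hypothetical, and the descriptive paragraphs themselves are not modeled.

```python
from collections import Counter


def couple_frequencies(couples_per_phrase, threshold=0.04):
    """Return, for each of the five positions, {couple: share of phrases}.

    Couples appearing in a position no more than `threshold` of the time
    are discarded.
    """
    total = len(couples_per_phrase)
    retained = []
    for position in range(5):
        counts = Counter(phrase[position] for phrase in couples_per_phrase)
        retained.append(
            {c: n / total for c, n in counts.items() if n / total > threshold}
        )
    return retained


# Hypothetical data: 29 phrases share one set of couples, one phrase differs.
common = [("+blue", "+red"), ("xgray", "xyellow"), ("=green", "=violet"),
          ("-black", "-brown"), ("+blue", "-brown")]
other = [("+red", "+blue"), ("xgray", "xyellow"), ("=green", "=violet"),
         ("-black", "-brown"), ("+red", "-brown")]
frequencies = couple_frequencies([common] * 29 + [other])
```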

Pattern Comparison Using Myers-Briggs Enhanced with 
Luscher Knowledge Base 

In another embodiment, the pattern comparison with the 
Myers-Briggs knowledge base is enhanced with information 
from the Luscher knowledge base. In addition to the 16 
Myers-Briggs homogeneous groups, this knowledge base 
also includes 8 subgroups corresponding to each Myers- 
Briggs group. The Luscher color test is administered to each 
homogeneous group representing a Myers-Briggs personal- 
ity type. Each group is divided into 8 subgroups, where each 
subgroup corresponds to the favorite color (of the eight) 
chosen by the reference subjects within the Myers-Briggs 
group. For example, the first Myers-Briggs type is ENTP. 
The reference subjects that primarily manifest this type form 
a homogeneous group whose speech parameters are stored 
in the knowledge base. This group is then administered the 
Luscher test to determine the favorite colors of the members 
of the group. The group is then broken down into 8 sub- 
groups based upon favorite color preference. These sub- 
groups are: ENTP-gray, ENTP-blue, ENTP-green, ENTP- 
red, ENTP-yellow, ENTP-violet, ENTP-brown, and ENTP- 
black. Accordingly, the knowledge base now comprises 
16x8=128 subgroups in addition to the original 16 Myers- 
Briggs groups for a total of 144 speech models correspond- 
ing to homogeneous groups. 

This enhanced knowledge base is used by first conducting 
a pattern comparison with the 16 Myers-Briggs speech 
models in the knowledge base, as before. This yields 30 
highest-probability Jungian types for the 30 phrases in the 
test subject’s utterance. Each phrase is then compared with 
the 8 speech model subgroups corresponding to the highest 
probability type for the phrase. This results in 8 Viterbi 
similarity values for each phrase. The 8 colors for the phrase 
are then ranked in order from highest degree of similarity 
to lowest. These ranked colors are then sorted into 5 color 
couples according to the Luscher technique described above. 

The number of times a color couple appears in the first 
position is divided by 30 to yield the proportion in percent- 
age that a color couple appears in the first position. This 
process is repeated for the second, third, fourth and fifth 
couple positions. As before, only those color couples that 
appear in a particular position greater than 4% of the time 
are selected. For each of these couples, a descriptive para- 
graph concerning the psychological characteristics associ- 
ated with the color couple is displayed, along with the per- 
centage occurrence of that couple in that position. 

Pattern Comparison Using MMPI Knowledge Base 

In yet another embodiment, the knowledge base may be 
formed using the Minnesota Multiphasic Personality Inven- 
tory (MMPI). For an explanation of the MMPI and how it is 
administered, please refer to J. N. Butcher, W. G. Dahlstrom, 
J. R. Graham, A. Tellegen, B. Kaemmer, Minnesota Mul- 
tiphasic Personality Inventory (MMPI-2) Manual for 
Administration and Scoring, University of Minnesota Press, 
Minneapolis, 1989; R. L. Greene, The MMPI-2/MMPI: An 
Interpretive Manual, Allyn and Bacon, 1991; and J. R. 
Graham, MMPI-2: Assessing Personality and 
Psychopathology, Oxford University Press, 1990; all of 
which are incorporated by reference herein. 

The Minnesota Multiphasic Personality Inventory- 
Second Edition (MMPI-2) is a 567-item paper-and-pencil 
self-report inventory that utilizes the true-false response 
format. The MMPI is currently the most widely used and 
researched objective personality inventory. The MMPI pro- 
vides an objective means of assessing abnormal behavior. 
The MMPI categorizes the psychological makeup of an 
individual into ten scales or criterion groups, as follows: 


1. Hypochondriasis 
2. Depression 
3. Hysteria 
4. Psychopathic Deviate 
5. Masculinity-Femininity 
6. Paranoia 
7. Psychasthenia 
8. Schizophrenia 
9. Hypomania 
10. Social Introversion 


In addition, four validity scales measure the individual’s 
test-taking attitude. 

The MMPI-2 clinical scales are scaled to the familiar 
T-score metric having a mean of 50 and standard deviation 
of 10. These T-scores are based on the responses of approxi- 
mately 2,600 subjects (1,138 males and 1,462 females). A 
T-score indicates how many standard deviation units above 
or below the mean an individual’s score lies in a distribution 
of scores. A T-score of 50 for any particular scale indicates 
that a subject’s score is equal to the mean score for the 
standardization sample. Generally, T-scores that are greater 
than or equal to two standard deviations above the mean, i.e., 
a score of 70 or above, or less than or equal to one standard 
deviation below the mean, i.e., 40 or below, are deemed worthy 
of clinical interpretation. The MMPI scales represent a 
continuum corresponding to the degree to which a particular 
criterion, e.g., depression, is expressed in an individual 
subject. Accordingly, unlike the Myers-Briggs or Luscher 
categories, the MMPI criterion groups cannot be simply 
assigned to psychologically homogeneous groups in the 
knowledge base. Rather, the groups in the knowledge base 
are formed only from those reference subjects who manifest 
a high degree of expression of the psychological construct 
associated with each MMPI scale. The scale scores range 
from 20 to 115, where 115 corresponds to a high degree of 
expression. A reference subject is selected for placement in 
a psychologically homogeneous group if the subject scores 
above 70 points on the scale for a particular criterion group 
while scoring below 60 points on all other scales. For 
example, a subject is classified as depressed if the subject 
scores above 70 on the depression scale, while scoring 
below 60 on all the other scales. Alternatively, reference 
subjects may be classified according to two-point MMPI 
code types described in Greene and in Graham. 
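
The selection rule for reference subjects may be sketched as follows. The thresholds of 70 and 60 come from the description above; the function name and the example T-scores are hypothetical, and only three of the ten scales are shown for brevity.

```python
def assign_criterion_group(t_scores, high=70, low=60):
    """Return the scale that qualifies the subject, or None if none does.

    A subject qualifies for a scale when the score on that scale is
    above `high` while the scores on all other scales are below `low`.
    """
    for scale, score in t_scores.items():
        others_low = all(
            other < low for name, other in t_scores.items() if name != scale
        )
        if score > high and others_low:
            return scale
    return None


# Hypothetical T-scores for two reference subjects.
depressed = {"Depression": 78, "Paranoia": 52, "Hypomania": 47}
mixed = {"Depression": 78, "Paranoia": 74, "Hypomania": 47}
```

Because the thresholds are parameters, the same sketch could be reused with other cutoffs where a different inventory calls for them.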

The MMPI knowledge base is employed in the pattern 
comparison in much the same way as the Myers-Briggs 
knowledge base. That is, 30 phrases of eight states each are 
collected from the test subject. These 30 phrases are con- 
verted into 240 cepstral vectors. The eight cepstral vectors 
corresponding to the first phrase are compared using the 
Viterbi algorithm with the three-cluster sets representing 
each state for the first MMPI criterion group. The first phrase 
is similarly compared to the other nine criterion groups. This 
process is repeated for the second through thirtieth phrases. 
The result is 30x10=300 Viterbi similarity values. This data 
is reduced by assigning only the criterion group that resulted 
in the highest similarity value for each phrase. This results 
in 30 criterion groups corresponding to the 30 phrases. As 
with the Myers-Briggs knowledge base, the frequency of 
occurrence of each criterion group is divided by 30 to yield 
the percentage of the total personality space for the test 
subject. Any criterion group that accounts for less than 3% 
is ignored by the program. The CPU then displays the 
remaining criterion groups along with the corresponding 
percentage of the test subject’s personality space. 

Those skilled in the art will recognize that the present 
invention may similarly be applied to other psychological 
assessment scales, such as the Millon Clinical Multiaxial 
Inventory-3rd Edition (MCMI-III). The MCMI-III is a 175- 
item paper-and-pencil self-report inventory that also utilizes 
a true-false response format. The test comprises 14 person- 
ality scales. The 14 scales provide a statistically significant 
differentiation of subjects on the basis of the DSM-III and 
DSM-III-R nosology of personality disorders. The 14 scales 
are named: Schizoid, Avoidant, Depressive, Dependent, 
Histrionic, Narcissistic, Antisocial, Aggressive (Sadistic), 
Compulsive, Passive-Aggressive (Negativistic), Self- 
Defeating, Schizotypal, Borderline, and Paranoid. The 
scales are scaled to a T-score metric. However, the T-scores 
are adjusted so that a score of 85 corresponds to actual 
prevalence rate of the trait measured, a score of 60 corre- 
sponds to the median raw score, and a score of 115 corre- 
sponds to the maximum attained raw score. In general, 
scores from 75 to 84 indicate the presence of the mea- 
sured disorder, whereas scores greater than 84 indicate the 
prominence of the measured disorder. Based upon these 
statistics, the present invention may employ the MCMI-III in a 
manner similar to its use of the MMPI by assigning reference 
subjects to a psychologically homogeneous group in the 
knowledge base if they score above 84 on the scale corre- 
sponding to the psychologically homogeneous group while 
scoring less than 75 on the other scales. 

Alternative Scaling Method 

In yet another embodiment, a group of reference subjects 
may be tested on a personality inventory, and then trichoto- 
mized on the basis of their scores on the inventory using 
standard test construction techniques. The three groups form 
psychologically homogeneous groups for the inventory 
scale. Speech parameters are collected from these groups to 
form three speech models in the knowledge base. 

For example, the subjects may be tested on a depression 
inventory or scale. The highest scorers (most depressed) 
may be sorted into group Number 3, the middle or average 
scorers into group Number 2, and the lowest scorers into 
group Number 1, forming three corresponding speech mod- 
els in the knowledge base. Next, the similarity between the 
speech characteristics of each of a test subject’s phrases, 
phrases 1-30, and the speech models for the extremes of the 
depression scale groups in the knowledge base, Number 1 
and Number 3, are computed. Each phrase is classified as 
belonging to group Number 1 or group Number 3 within the 
depression inventory (scale) according to which speech 
model is closest as measured by the Viterbi algorithm. A 
total depression score is then obtained as the difference 
between the number of group Number 3 phrases and the 
number of group Number 1 phrases within the 30 phrase 
utterance. This score may be displayed by the system. 
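
The difference score may be illustrated as follows. The sketch assumes that a similarity value to each extreme model is available for every phrase, and that ties are resolved toward group Number 1; both the tie rule and the names used are hypothetical.

```python
def classify_phrase(similarity_to_1, similarity_to_3):
    """Assign a phrase to group 3 or group 1, whichever model is closer."""
    return 3 if similarity_to_3 > similarity_to_1 else 1


def depression_difference_score(phrase_groups):
    """Number of group-3 phrases minus number of group-1 phrases."""
    return phrase_groups.count(3) - phrase_groups.count(1)


# Hypothetical classification of 30 phrases into the two extreme groups.
phrase_groups = [3] * 21 + [1] * 9
score = depression_difference_score(phrase_groups)
```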

A weighted score may be obtained by adding up the group 
numbers (for group Numbers 1, 2 and 3) for each of the 30 
phrases. This technique gives a greater weight, i.e., 3, to the 
phrases corresponding to the most depressed group in the 
knowledge base. Following the convention of adjusting 
psychological scales according to their dispersion about 
their means, the mean and standard deviation of the depres- 
sion scale can be computed and used to transform the 
obtained depression scores (or raw scores) to standardized 
scores. Additionally the depression scale distribution may be 
normalized or smoothed to conform to standard psychologi- 
cal practice. This method can be extended to tests with 
multiple scales by applying the above described procedure 
scale by scale. In this way, these measures can be used to 
analyze the vocal utterance to imitate a wide variety of 
scale-based tests. 
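
A hedged sketch of the weighted score and its standardization follows. The use of the population standard deviation and the reference sample are assumptions made for illustration; the description does not specify them.

```python
import statistics


def weighted_score(phrase_groups):
    """Sum of the group numbers (1, 2 or 3) assigned to the phrases."""
    return sum(phrase_groups)


def standardized_score(raw, reference_scores):
    """Express a raw score in standard-deviation units about the mean."""
    mean = statistics.mean(reference_scores)
    spread = statistics.pstdev(reference_scores)
    return (raw - mean) / spread


# Hypothetical utterance: ten phrases assigned to each of the three groups.
phrase_groups = [3] * 10 + [2] * 10 + [1] * 10
raw = weighted_score(phrase_groups)

# Hypothetical raw scores from a reference sample.
reference = [50, 60, 70]
z = standardized_score(70, reference)
```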

Measuring the Degree of Sincerity 

The present invention may be employed to measure the 
degree of sincerity of a test subject, where the extremes of 
the sincerity continuum represent falsehood and truth. In one 
embodiment, the knowledge base may be formed of two 
psychologically homogeneous groups: liars and truthtellers. 
Using one technique, the reference subjects are psycho- 
logically stressed by instructing them to make true and false 
statements about personally catastrophic events, such as a 
death in the family. The groups may actually comprise the 
same people, where the liars’ group in the knowledge base 
contains speech parameters from those people speaking lies 
and the truthful group in the knowledge base contains 
speech parameters of those people making true statements. 

Alternatively, the reference subjects are instructed that 
they are participating in an experiment to determine the 
accuracy of a lie detector. The reference subjects are ran- 
domly partitioned into two groups. One group is instructed 
to tell the truth, and the other group is instructed to lie. The 
group that is instructed to lie is offered a reward if they are 
able to deceive the lie detector successfully. The inducement 
of a reward serves to effect the heightened anxiety that may 
be experienced by individuals that lie to obtain some sec- 
ondary gain (e.g., escape from punishment, attainment of a 
job). The respective speech parameters of the liars and the 
truthtellers are entered into the knowledge base. 

As with other tests, 30 phrases of eight states each are 
collected from the test subject to perform the pattern com- 
parison. These 30 phrases are converted into 240 cepstral 
vectors. The eight state cepstral vectors corresponding to the 
first phrase are compared using the Viterbi algorithm with 
each three-cluster set representing each state for the truthful 
group. The first phrase is similarly compared to the liars’ 
group in the knowledge base. This process is repeated for the 
second through thirtieth phrases. The result is 30x2=60 
Viterbi similarity values. This data is reduced by assigning 
to each phrase only the group that resulted in the highest 
similarity value for each phrase. This results in 30 groups 
(true or false) corresponding to the 30 phrases. The fre- 
quency of occurrence of each group is divided by 30 to yield 
a percentage measure of the truthfulness of the test subject’s 
utterance. The percentage scores for each group may be 
normalized to conform to standard psychological practice. If 
the percentile rank assigned to truthfulness is greater than 
the 84th percentile (one standard deviation), then the thirty- 
phrase utterance is deemed to be truthful. Conversely, if 
the percentile rank of falsity is greater than the 84th percentile, 
then the utterance is deemed to be false. If the 84th percentile 
threshold is not met for either falsity or truthfulness, then the 
veracity of the utterance is deemed to be questionable. 
Alternatively, a 98th percentile rank (two standard deviations) 
threshold may be employed to achieve a greater degree of 
certainty. One or two standard deviations are conventional 
statistical thresholds in the physical and social sciences; of 
course, other thresholds may be employed if warranted by 
other psychological testing methods. 
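
The decision rule may be sketched as follows. The conversion of raw percentages into percentile ranks is not specified in the description, so the sketch takes the percentile ranks as given inputs; the function names and labels are hypothetical.

```python
def group_percentages(phrase_labels):
    """Percentage of phrases assigned to the truthful and false groups."""
    total = len(phrase_labels)
    return {
        label: 100.0 * phrase_labels.count(label) / total
        for label in ("true", "false")
    }


def classify_utterance(truth_percentile, false_percentile, threshold=84.0):
    """Apply the percentile threshold to decide the utterance's veracity."""
    if truth_percentile > threshold:
        return "truthful"
    if false_percentile > threshold:
        return "false"
    return "questionable"


# Hypothetical per-phrase outcomes for a 30-phrase utterance.
labels = ["true"] * 24 + ["false"] * 6
percentages = group_percentages(labels)
```

Passing threshold=98.0 applies the stricter two-standard-deviation criterion.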
Unlike the other tests described above, the measure of 
sincerity is time-dependent on the truth or falsity of the 
utterance being made by the test subject. Accordingly, 
sincerity is displayed as a function of time, as shown in FIG. 
6. The figure illustrates a sonogram in which the sonogram 
trace is colored red for those utterances which are deemed 
false, and colored blue for those utterances deemed truthful. 
Utterances of questionable veracity are displayed in a violet 
color on the sonogram. These colors will vary over time with 
the truthfulness of the statement made by the subject. 

Other time-dependent psychological characteristics may 
also be displayed in this manner. For example, a psycho- 
logically homogeneous group of reference subjects uttering 
humorous statements may be formed, along with a group 
making serious statements. A pattern comparison similar to 
that used for truth and falsity may be employed. In this case, 
humor may be displayed with a green color on the sono- 
gram. Those skilled in the art will recognize that this color 
sonogram display technique may be employed to display 
any psychological, physiological or other characteristics of 
the speaker. In particular, the color display for any of these 
characteristics may vary with time according to the charac- 
teristic measured at a particular time as the subject speaks. 

In another embodiment, the invention indicates time- 
dependent psychological characteristics using the SOCION 
theory of inter-typology cooperation developed in the 
former Soviet Union by A. Augustinavichute, R. 
Bogdashevsky, and V. Alexeev. The SOCION theory is 
described in A. Augustinavichute, Inter-Type Relations Fur- 
ther to the “A” Module Description, Latvia 1980 and E. 
Filatov, “SOCIONICA For You,” Siberian Chronograph, 
Novosibirsk City 1993 (ISBN 5-87550-010-7), which are 
incorporated by reference herein. The SOCION matrix is a 
representation of the degree to which individuals classified 
by 16 SOCION types will cooperate and work productively 
with one another. The 16 SOCION types can be considered 
modified Myers-Briggs types, and are, in fact, the result of 
modifications by Augustinavichute, et al. to the Myers- 
Briggs assessment. 

The SOCION matrix has rows 1-16, where the ith row 
represents an individual who is predominantly of the ith 
SOCION character type. The matrix also has columns 1-16, 
where the jth column represents individuals who are pre- 
dominantly of the jth character type. (A person is classified 
as predominantly of one type if matched to that type more 
than all other types.) Each row/column intersection ij indi- 
cates the relationship between an individual of the ith type 
and an individual of the jth type based upon the SOCION 
theory of inter-typology cooperation. The SOCION matrix is 
illustrated in FIG. 7. Each intersection ij is filled with a 
symbol indicating the predicted nature of an interpersonal 
relationship between a person of the ith type and a person of 
the jth type, and in particular, the likelihood that a person of 
the ith type would cooperate in a complementary and 
productive fashion with a person of the jth type. 

The present invention adapts the normative (inter- 
individual) approach of the SOCION matrix for an ipsative 
(intra-individual) purpose. Applying group data to interpre- 
tation of an individual in this manner is rooted in the 
application of the well-known principles of inferential sta- 
tistics and “true score” theory. 

The present invention employs the SOCION matrix to 
measure the degree of sincerity as follows. The matrix is 
stored in a lookup table in memory 106. A knowledge base 
is formed based upon the 16 SOCION types in much the 
same way it is formed for the Myers-Briggs assessment. In 
other words, a statistically large group of reference subjects 
is assessed under the SOCION theory, and thereby divided 
into 16 SOCION types. As a test subject speaks, each phrase 
is divided into 8 states. Thirty phrases are not required. One 
cepstral vector is calculated for each state. Using the Viterbi 
algorithm, each eight-state phrase is compared to each of the 
16 SOCION speech models. For each phrase, the two speech 
models that correlate most highly with the phrase (i.e., the 
two highest ranked models) are retained. The two SOCION 
types that correspond to these speech models are used as row 
and column indexes of the SOCION matrix. For each 
phrase, the intersection of these two indexes is retained. 

According to the SOCION theory, if the intersection of 
the indexed row and the indexed column indicates that the 
two typologies are in conflict, this indicates stress in the test 
subject as the test subject speaks the phrase. Referring to 
FIG. 7, if the intersection of the two typologies in the matrix 
is represented by a “D,” then the two typologies are in 
conflict and indicate that it is likely that the test subject is 
lying while speaking the phrase under test. In the sonogram, 
the portion corresponding to this phrase is colored red to 
indicate a lie. 

If the intersection of the two typologies contains the 
symbol “hs,” then this casts some doubt on the truthfulness 
of the phrase. In the sonogram, the phrase would then be 
colored violet. 

If the intersection of the typologies contains the symbol 
“R,” then this indicates that the test subject is speaking the 
phrase in a humorous manner. This state of mind is repre- 
sented by green on the sonogram portion that indicates that 
the phrase is being spoken. All other symbols indicate no 
conflict within the individual test subject, and are indicated 
by a blue color on the sonogram. 
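
The per-phrase lookup and coloring may be illustrated as follows. The sketch assumes that the highest ranked type indexes the row and the second ranked type the column, which the description leaves open. The three-type matrix, its type labels, and the "ok" symbol are hypothetical stand-ins for the 16x16 matrix of FIG. 7.

```python
# Symbols named in the description; any other symbol maps to blue.
SYMBOL_COLORS = {"D": "red", "hs": "violet", "R": "green"}


def top_two(similarity):
    """Return the two types with the highest similarity, best first."""
    ranked = sorted(similarity, key=similarity.get, reverse=True)
    return ranked[0], ranked[1]


def phrase_color(matrix, similarity):
    """Look up the matrix cell for the top two types and map it to a color."""
    row, column = top_two(similarity)
    return SYMBOL_COLORS.get(matrix[row][column], "blue")


# Toy matrix with made-up contents.
toy_matrix = {
    "T1": {"T1": "ok", "T2": "D", "T3": "R"},
    "T2": {"T1": "D", "T2": "ok", "T3": "hs"},
    "T3": {"T1": "ok", "T2": "hs", "T3": "ok"},
}
similarity = {"T1": -4.0, "T2": -6.5, "T3": -5.0}
color = phrase_color(toy_matrix, similarity)
```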

The matrix relating the degree of sincerity to SOCION 
types, Jungian types or other psychological measures may 
be formed as follows. First, groups of liars and truthtellers 
are formed as described above. For the example of the 
Myers-Briggs assessment, the matrix may be formed by 
identifying through actuarial analysis the first and second 
ranked Myers-Briggs types that are displayed more consis- 
tently and frequently in the voice of liars than in the voice 
of truthtellers. The presence of these two types in the voice 
of a test subject serves as a marker for false statements. 

Measuring Degree of Cooperation 

The present invention may also be employed in conjunc- 
tion with the SOCION matrix to determine the degree of 
cooperation between individuals. First, one individual 
speaks into the system of the invention. In a manner similar 
to that described above with respect to the Myers-Briggs 
assessment, the system generates a SOCION assessment of 
the individual. Second, another individual speaks into the 
invention, providing another SOCION assessment. The 
highest ranking SOCION types from the two individuals are 
used as row and column indexes of the SOCION matrix. The 
degree of cooperation between the individuals is determined 
by the system at the intersection between the first and second 
indexes. This process is performed by the speech processing 
software in the same manner as if the individuals had taken 
pencil-and-paper SOCION assessments and their resulting 
character types had been used to index the matrix. 
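
The matrix lookup described above can be sketched as follows. The type labels and matrix entries are placeholders invented for illustration and are not the actual SOCION matrix; only the indexing step follows the text. The names `top_type`, `cooperation` and `TOY_MATRIX` are hypothetical.

```python
def top_type(ranked_scores):
    """Return the label of the highest ranking type for one speaker."""
    return max(ranked_scores, key=ranked_scores.get)

def cooperation(matrix, first_scores, second_scores):
    """Look up the matrix entry at the intersection of the two top types."""
    row = top_type(first_scores)     # first individual gives the row index
    col = top_type(second_scores)    # second individual gives the column index
    return matrix[row][col]

# Placeholder 2x2 matrix; real entries would come from the SOCION matrix.
TOY_MATRIX = {
    "TYPE_A": {"TYPE_A": "relation_1", "TYPE_B": "relation_2"},
    "TYPE_B": {"TYPE_A": "relation_2", "TYPE_B": "relation_1"},
}

first = {"TYPE_A": 0.6, "TYPE_B": 0.4}    # hypothetical speech-derived scores
second = {"TYPE_A": 0.2, "TYPE_B": 0.8}
degree = cooperation(TOY_MATRIX, first, second)
```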

Physiological Testing 

The present invention may also be employed for physi- 
ological testing. In this case, the psychologically homoge- 
neous groups in the knowledge base described above are 
replaced by physiologically homogeneous groups. For 
example, a group of patients with heart problems form one 
physiologically homogeneous group, whereas a group of 
healthy subjects form another physiologically homogeneous 
group. Thirty phrases of a test subject’s speech are recorded 
and analyzed to determine the probability that the test 
subject falls within either category. The frequency of occur- 
rence of each group is divided by thirty to yield a percentage 
measured for each group. The percentage scores for each 
group may be normalized to conform to standard practice. If 
the percentile rank associated with either group is greater 
than the 84th percentile (one standard deviation above the 
mean), or alternatively the 98th percentile (two standard 
deviations), the subject is deemed to belong to that group. 
Otherwise, the test is deemed inconclusive. Again, one or two 
standard deviations are conventional statistical thresholds in 
test construction; of course, other thresholds may apply based 
upon the condition studied. 
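
The decision rule above can be sketched as follows. The sketch assumes that each phrase has already been assigned to a group by the speech comparison, and that normalization means converting each group's percentage to a standard score against a norming sample. The `norms` values, group labels and function names are hypothetical.

```python
from collections import Counter

def group_percentages(phrase_labels):
    """Frequency of occurrence of each group divided by the phrase count."""
    n = len(phrase_labels)
    return {g: c / n for g, c in Counter(phrase_labels).items()}

def decide(phrase_labels, norms, z_threshold=1.0):
    """Return the group whose normalized score exceeds the threshold.

    norms maps group -> (mean, sd) of the percentage score in a norming
    sample (assumed values). z_threshold=1.0 corresponds to roughly the
    84th percentile and 2.0 to roughly the 98th.
    """
    pct = group_percentages(phrase_labels)
    best_group, best_z = None, z_threshold
    for group, (mean, sd) in norms.items():
        z = (pct.get(group, 0.0) - mean) / sd
        if z > best_z:
            best_group, best_z = group, z
    return best_group if best_group is not None else "inconclusive"

# Thirty phrases, each labeled with the closest matching group.
labels = ["cardiac"] * 24 + ["healthy"] * 6
norms = {"cardiac": (0.5, 0.1), "healthy": (0.5, 0.1)}
result = decide(labels, norms)
```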

Determining Efficacy of Medication 

Based on the foregoing, the present invention can detect 
the presence of psychological or physiological disorders. 
Conversely, the invention, of course, can detect the absence 
of such disorders. Accordingly, a test subject having a 
disorder as indicated by the present invention may be 
prescribed a given medication to treat the disorder. After 
treatment, the present invention may be employed to assess 
the test subject for the treated psychological or physiological 
disorder. If the invention determines that the disorder has 
been mitigated, then this mitigation may have been due to 
the drug or other treatment. For 
example, a test subject indicated as suffering from severe 
depression through comparison to the MMPI knowledge 
base may be treated with an antidepressant medication or 
psychotherapy. After a round of treatment, mitigation of the 
depression may be measured by the invention. Large groups 
of test subjects may be assessed in this manner to determine 
the efficacy of a medication or other treatment. Therefore, 
the present invention may be employed both to conduct 
statistical trials of a treatment, and to determine the effec- 
tiveness of a treatment on an individual test subject. 
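
A before-and-after comparison over a group of test subjects might be sketched as follows. The text does not name a statistic, so the paired mean difference and paired t statistic used here are assumptions, as are the score values and the function name `paired_change`.

```python
from math import sqrt
from statistics import mean, stdev

def paired_change(before, after):
    """Mean reduction in a speech-derived score and its paired t statistic.

    before and after hold one score per subject, in the same subject order.
    Returns (mean_reduction, t); t is None when the differences have no
    spread, since the statistic is undefined in that case.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length lists with at least 2 subjects")
    diffs = [b - a for b, a in zip(before, after)]
    m = mean(diffs)
    sd = stdev(diffs)
    t = m / (sd / sqrt(len(diffs))) if sd > 0 else None
    return m, t

# Hypothetical depression scores before and after a round of treatment.
before = [0.9, 0.8, 0.7, 0.85]
after = [0.5, 0.6, 0.4, 0.45]
mean_reduction, t_stat = paired_change(before, after)
```

For a single test subject, the same idea reduces to comparing that subject's score before and after treatment.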

The present invention has additional applications in any 
field where psychological or physiological testing is cur- 
rently used. Moreover, because the present invention can 
perform these assessments in a relatively short period of 
time, based on a short speech sample, it can reduce the 
expense and effort to conduct such tests. Further, the inven- 
tion allows these assessments to be employed in applications 
for which conventional testing would be subject to unac- 
ceptable time and money constraints. Such applications 
include, without limitation, rapid airline passenger security 
screening, rapid psychological screening in a managed 
health care environment, and monitoring of compliance and 
motivation of substance abusers under treatment. 

An important aspect of the present invention is that it can 
be easily trained to associate speech parameters with psy- 
chological or physiological characteristics regardless of the 
(non-speech based) assessment employed to quantify those 
characteristics. The system operator need only administer 
the assessment, e.g., Myers-Briggs, to a statistically significant 
group of reference subjects, and record speech samples 
from each homogeneous group determined by the assess- 
ment. Determination of the number of subjects necessary to 
achieve statistical significance is known in the art, and is 
described in L. M. Crocker and J. Algina, Introduction to 
Classical and Modern Test Theory, New York: Holt, Rine- 
hart and Winston, 1986, which is incorporated by reference 
herein. Based upon this empirical data, the speech-based 
system of the invention then creates a knowledge base 
representing the desired assessment in the “speech domain.” 
In this manner, the system is easily trainable to administer 
any test using a rapid characterization of a test subject's 
speech. 

Further, the invention does not relate to a particular 
psychological or physiological theory about what specific 
speech characteristics distinguish one homogeneous group 
from another. Moreover, it does not require any a priori 
knowledge of speech, although it may be adapted to take 
such information into account. Rather, as described above, it 
is based upon an empirical analysis of speech using a broad 
speech model. In one embodiment, speech is characterized 
with an LPC model based upon a time-ordered series of 
frequency characteristics, e.g., eight cepstral vectors per 
phrase. This time/frequency representation provides a 
description of speech that is much broader than (and inde- 
pendent of a priori knowledge of) the specific dimensions of 
speech or speech style elements employed by the prior art. 
This LPC model also accounts for the relative phase of 
different frequencies, unlike most, if not all, of the known 
prior art. This broad model is then empirically correlated 
with a psychological or physiological assessment. This 
relatively full, yet still compact, characterization permits the 
system a great deal of flexibility in the types of assessments 
that may be carried out. 
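
One way to obtain a time-ordered series of cepstral vectors from an LPC model is sketched below. It is an illustration under stated assumptions, not the implementation of the invention: the phrase is split into eight equal segments, LPC coefficients are found by the autocorrelation method with the Levinson-Durbin recursion, and the standard LPC-to-cepstrum recursion is applied. The LPC order and all function names are hypothetical.

```python
def lpc(frame, order):
    """Predictor coefficients a[1..order] by the autocorrelation method."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + k] for i in range(n - k))
         for k in range(order + 1)]
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:  # silent or degenerate frame: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]

def lpc_to_cepstrum(a):
    """Cepstral coefficients c[1..p] of the all-pole model 1/(1 - sum a_k z^-k)."""
    p = len(a)
    c = [0.0] * p
    for n in range(1, p + 1):
        acc = a[n - 1]
        for k in range(1, n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c

def phrase_cepstra(samples, n_vectors=8, order=10):
    """Time-ordered list of cepstral vectors, one per equal-length segment."""
    seg = len(samples) // n_vectors
    return [lpc_to_cepstrum(lpc(samples[i * seg:(i + 1) * seg], order))
            for i in range(n_vectors)]
```

Each phrase is thereby reduced to eight short vectors, which keeps the representation compact while preserving the order in which spectral shapes occur.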

The invention is also not location dependent. That is, the 
test subject does not need to be proctored by a test admin- 
istrator located within the same room. Rather, the speech 
acquisition circuitry may be located remotely from the 
signal processing circuitry that performs the comparison 
with the knowledge base. For example, the test subject’s 
speech may be digitized by the subject’s home computer and 
transmitted by modem (e.g., over the Internet) to a central 
location that provides remote physiological or psychological 
assessment services. The results are displayed on the home 
computer. This adaptation is easily implemented using exist- 
ing technology. 

Those skilled in the art will recognize that the present 
invention may be employed to associate speech parameters 
with not only psychological and physiological conditions, 
but any other condition present in an individual. This can be 
achieved as long as the correlation between a subject’s 
condition and the subject’s speech parameters can be veri- 
fied as significant through testing independent of the present 
invention. 

Note that all patents and other references cited herein are 
incorporated by reference herein in their entirety. 

Although the invention has been described in conjunction 
with particular embodiments, it will be appreciated that 
various modifications and alterations may be made by those 
skilled in the art without departing from the spirit and scope 
of the invention. For example, as mentioned above, a wide 
variety of well-known speech comparison techniques may 
be adapted for implementation in the present invention. The 
invention is not to be limited by the foregoing illustrative 
details, but rather is to be defined by the appended claims. 

What is claimed is: 

1. A psychological assessment system comprising: 


a knowledge base including at least one speech model 
corresponding to at least one psychological character- 
istic of a plurality of reference subjects, wherein the 
speech model is independent of a priori knowledge of 
style parameters; and 

signal processing circuitry for comparing the at least one 
speech model with test speech parameters of a test 
subject. 

2. The system of claim 1, wherein the at least one speech 
model is represented by a statistical time-ordered series of 
frequency representations of the speech of the reference 
subjects. 

3. The system of claim 1, further comprising: 

speech parameterization circuitry for generating the test 
parameters in response to the test subject’s speech. 

4. The system of claim 3, wherein the speech parameter- 
ization circuitry includes speech acquisition circuitry that is 
remote from the knowledge base. 

5. The system of claim 1 further comprising: 

output circuitry for outputting at least one indicator of a 
psychological characteristic in response to the comparison. 

6. The system of claim 5, wherein the psychological 
characteristic is time-varying, and the output circuitry out- 
puts the indicator of the psychological characteristic in a 
time-varying manner. 

7. The system of claim 5, wherein the output circuitry 
further outputs a ranking of each output psychological 
characteristic. 

8. The system of claim 1, wherein the at least one 
psychological characteristic indicates degree of sincerity. 

9. The system of claim 8, wherein the degree of sincerity 
varies with time. 

10. The system of claim 1, wherein the signal processing 
circuitry compares psychological characteristics before and 
after treatment for a psychological disorder, wherein the 
compared psychological characteristics are generated by the 
comparison of the at least one speech model with the test 
speech parameters. 

11. A method for psychological assessment comprising 
the steps of: 

providing a knowledge base including at least one speech 
model corresponding to at least one psychological 
characteristic of a plurality of reference subjects, 
wherein the speech model is independent of a priori 
knowledge of style parameters; and 

comparing the at least one speech model with test speech 
parameters of a test subject. 

12. The method of claim 11, wherein the at least one 
speech model is represented by a statistical time-ordered 
series of frequency representations of the speech of the 
reference subjects. 

13. The method of claim 11, further comprising the step 
of: 

generating the test parameters in response to the test 
subject’s speech. 

14. The method of claim 13, further comprising the step 
of acquiring the test subject’s speech remotely from the 
knowledge base. 

15. The method of claim 11 further comprising the step of: 

outputting at least one indicator of a psychological 
characteristic in response to the comparison. 

16. The method of claim 15, wherein the psychological 
characteristic is time-varying, further comprising the step of 
outputting the indicator of the psychological characteristic in 
a time-varying manner. 

17. The method of claim 15, the outputting step further 
comprising the step of outputting a ranking of each output 
psychological characteristic. 

18. The method of claim 11, wherein the at least one 
psychological characteristic indicates degree of sincerity. 

19. The method of claim 18, wherein the degree of 
sincerity varies with time. 

20. The method of claim 11, further comprising the step 
of comparing psychological characteristics before and after 
treatment for a psychological disorder, wherein the com- 
pared psychological characteristics are generated by the 
comparison of the at least one speech model with the test 
speech parameters. 

21. A physiological assessment system comprising: 

a knowledge base including at least one speech model 
corresponding to at least one physiological character- 
istic of a plurality of reference subjects, wherein the 
speech model includes information concerning the 
vocal tracts of the reference subjects; and 

signal processing circuitry for comparing the at least one 
speech model with test speech parameters of a test 
subject. 

22. The system of claim 21, wherein the at least one 
speech model is represented by a statistical time-ordered 
series of frequency representations of the speech of the 
reference subjects. 

23. The system of claim 21, wherein the at least one 
speech model is independent of a priori knowledge of style 
parameters. 

24. The system of claim 21, further comprising: 

speech parameterization circuitry for generating the test 
parameters in response to the test subject’s speech. 

25. The system of claim 24, wherein the speech param- 
eterization circuitry includes speech acquisition circuitry 
that is remote from the knowledge base. 

26. The system of claim 21 further comprising: 

output circuitry for outputting at least one indicator of a 
physiological characteristic in response to the compari- 
son. 

27. The system of claim 26, wherein the physiological 
characteristic is time-varying, and the output circuitry out- 
puts the indicator of the physiological characteristic in a 
time-varying manner. 

28. The system of claim 26, wherein the output circuitry 
further outputs a ranking of each output physiological char- 
acteristic. 

29. The system of claim 21, wherein the signal processing 
circuitry compares physiological characteristics before and 
after treatment for a physiological disorder, wherein the 
compared physiological characteristics are generated by the 
comparison of the at least one speech model with the test 
speech parameters. 

30. A method for physiological assessment comprising the 
steps of: 

providing a knowledge base including at least one speech 
model corresponding to at least one physiological char- 
acteristic of a plurality of reference subjects, wherein 
the speech model includes information concerning the 
vocal tracts of the reference subjects; and 

comparing the at least one speech model with test speech 
parameters of a test subject. 

31. The method of claim 30, wherein the at least one 
speech model is represented by a statistical time-ordered 
series of frequency representations of the speech of the 
reference subjects. 

32. The method of claim 30, wherein the speech model is 
independent of a priori knowledge of style parameters. 

33. The method of claim 30, further comprising the step 
of: 

generating the test parameters in response to the test 
subject's speech. 

34. The method of claim 33, further comprising the step 
of acquiring the test subject's speech remotely from the 
knowledge base. 

35. The method of claim 30 further comprising the step of: 

outputting at least one indicator of a physiological char- 
acteristic in response to the comparison. 

36. The method of claim 35, wherein the physiological 
characteristic is time-varying, further comprising the step of 
outputting the indicator of the physiological characteristic in 
a time-varying manner. 

37. The method of claim 35, the outputting step further 
comprising the step of outputting a ranking of each output 
physiological characteristic. 

38. The method of claim 30, further comprising the step 
of comparing physiological characteristics before and after 
treatment for a physiological disorder, wherein the com- 
pared physiological characteristics are generated by the 
comparison of the at least one speech model with the test 
speech parameters. 

39. In a system for assessing at least one psychological or 
physiological characteristic of a test subject, a knowledge 
base comprising: 

at least one speech model corresponding to at least one 
characteristic, wherein the at least one speech model is 
independent of a priori knowledge of style parameters, 
and 

a statistical time-ordered series of frequency representa- 
tions of the speech of a plurality of reference subjects 
within the at least one speech model. 

40. The knowledge base of claim 39, wherein the at least 
one speech model includes information concerning the vocal 
tracts of the reference subjects. 

41. In a method for assessing at least one psychological or 
physiological characteristic of a test subject, a method for 
creating a knowledge base comprising the steps of: 

forming at least one speech model corresponding to at 
least one characteristic, wherein the at least one speech 
model is independent of a priori knowledge of style 
parameters, and 

generating a statistical time-ordered series of frequency 
representations of the speech of a plurality of reference 
subjects within the at least one speech model. 

42. The method of claim 41, wherein the at least one 
speech model includes information concerning the vocal 
tracts of the reference subjects. 

43. A physiological assessment system comprising: 

a knowledge base including at least one speech model 
corresponding to at least one physiological character- 
istic of a plurality of reference subjects, wherein the at 
least one speech model is independent of a priori 
knowledge of style parameters; and 

signal processing circuitry for comparing the at least one 
speech model with test speech parameters of a test 
subject. 

44. The system of claim 43, wherein the at least one 
speech model is represented by a statistical time-ordered 
series of frequency representations of the speech of the 
reference subjects. 

45. The system of claim 43, further comprising: 

speech parameterization circuitry for generating the test 
parameters in response to the test subject’s speech. 

46. The system of claim 45, wherein the speech param- 
eterization circuitry includes speech acquisition circuitry 
that is remote from the knowledge base. 

47. The system of claim 43, further comprising: 

output circuitry for outputting at least one indicator of a 
physiological characteristic in response to the compari- 
son. 

48. The system of claim 47, wherein the physiological 
characteristic is time-varying, and the output circuitry out- 
puts the indicator of the physiological characteristic in a 
time-varying manner. 

49. The system of claim 47, wherein the output circuitry 
further outputs a ranking of each output physiological char- 
acteristic. 

50. The system of claim 43, wherein the signal processing 
circuitry compares physiological characteristics before and 
after treatment for a physiological disorder, wherein the 
compared physiological characteristics are generated by the 
comparison of the at least one speech model with the test 
speech parameters. 

51. A method for physiological assessment comprising the 
steps of: 

providing a knowledge base including at least one speech 
model corresponding to at least one physiological 
characteristic of a plurality of reference subjects, wherein 
the at least one speech model is independent of a priori 
knowledge of style parameters; and 

comparing the at least one speech model with test speech 
parameters of a test subject. 

52. The method of claim 51, wherein the at least one 
speech model is represented by a statistical time-ordered 
series of frequency representations of the speech of the 
reference subjects. 

53. The method of claim 51, further comprising the step 
of: 

generating the test parameters in response to the test 
subject’s speech. 

54. The method of claim 53, further comprising the step 
of acquiring the test subject’s speech remotely from the 
knowledge base. 

55. The method of claim 51, further comprising the step 
of: 

outputting at least one indicator of a physiological 
characteristic in response to the comparison. 

56. The method of claim 55, wherein the physiological 
characteristic is time-varying, further comprising the step of 
outputting the indicator of the physiological characteristic in 
a time-varying manner. 

57. The method of claim 55, the outputting step further 
comprising the step of outputting a ranking of each output 
physiological characteristic. 

58. The method of claim 51, further comprising the step 
of comparing physiological characteristics before and after 
treatment for a physiological disorder, wherein the com- 
pared physiological characteristics are generated by the 
comparison of the at least one speech model with the test 
speech parameters. 

59. In a system for assessing at least one physiological 
characteristic of a test subject, a knowledge base compris- 
ing: 

at least one speech model corresponding to each 
characteristic, wherein the speech model includes 
information concerning the vocal tracts of the reference 
subjects; and 

a statistical time-ordered series of frequency representations 
of the speech of a plurality of reference subjects 
within the at least one speech model. 

60. A system for assessing at least one psychological or 
physiological characteristic of a test subject, the system 
comprising: 

speech parameterization circuitry for generating test 
speech parameters in response to the test subject’s 
speech; and 

signal processing circuitry for comparing the test speech 
parameters with at least one speech model from a 
knowledge base, 

wherein the at least one speech model corresponds to at 
least one psychological or physiological characteristic 
of a plurality of reference subjects, and the at least one 
speech model is independent of a priori knowledge of 
style parameters. 

61. A system for assessing at least one physiological 
characteristic of a test subject, the system comprising: 

speech parameterization circuitry for generating test 
speech parameters in response to the test subject’s 
speech; and 

signal processing circuitry for comparing the test speech 
parameters with at least one speech model from a 
knowledge base, 


wherein the at least one speech model corresponds to at 
least one physiological characteristic of a plurality of 
reference subjects, and includes information concern- 
ing the vocal tracts of the reference subjects. 

62. A method for assessing at least one psychological or 
physiological characteristic of a test subject, the method 
comprising the steps of: 

generating test speech parameters in response to the test 
subject’s speech; and 

comparing the test speech parameters with at least one 
speech model from a knowledge base, 

wherein the at least one speech model corresponds to at 
least one psychological or physiological characteristic 
of a plurality of reference subjects, and the at least one 
speech model is independent of a priori knowledge of 
style parameters. 

63. A method for assessing at least one physiological 
characteristic of a test subject, the method comprising the 
steps of: 


generating test speech parameters in response to the test 
subject’s speech; and 

comparing the test speech parameters with at least one 
speech model from a knowledge base, 


wherein the at least one speech model corresponds to at 
least one physiological characteristic of a plurality of 
reference subjects, and includes information concern- 
ing the vocal tracts of the reference subjects. 


